Incremental Truncated LSTD
Authors
Abstract
Balancing computational efficiency and sample efficiency is an important goal in reinforcement learning. Temporal difference (TD) learning algorithms stochastically update the value function, with a time complexity linear in the number of features, whereas least-squares temporal difference (LSTD) algorithms are sample efficient but can be quadratic in the number of features. In this work, we develop an efficient incremental low-rank LSTD(λ) algorithm that progresses towards the goal of better balancing computation and sample efficiency. The algorithm reduces the computation and storage complexity to the number of features times the chosen rank parameter, while summarizing past samples efficiently enough to nearly obtain the sample efficiency of LSTD. We derive a simulation bound on the solution given by the truncated low-rank approximation, illustrating a bias-variance trade-off dependent on the choice of rank. We demonstrate that the algorithm effectively balances computational complexity and sample efficiency for policy evaluation in a benchmark task and a high-dimensional energy allocation domain.
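As a rough illustration of the idea, the sketch below (plain NumPy; all names are ours, not the paper's code) accumulates the standard LSTD(λ) statistics and then solves through a rank-k truncated SVD of A. The paper's contribution is to maintain this low-rank factorization incrementally in O(dk) per step; for clarity this simplified version forms the full d-by-d matrix instead.

    import numpy as np

    def truncated_lstd(transitions, d, gamma=0.99, lam=0.9, rank=10):
        # transitions: iterable of (phi, reward, phi_next) with d-dimensional features
        A = np.zeros((d, d))
        b = np.zeros(d)
        z = np.zeros(d)                          # eligibility trace
        for phi, r, phi_next in transitions:
            z = gamma * lam * z + phi            # decay trace, add current features
            A += np.outer(z, phi - gamma * phi_next)
            b += r * z
        # Solve with a rank-k pseudo-inverse of A; the rank is the
        # bias-variance knob the abstract describes.
        U, s, Vt = np.linalg.svd(A)
        k = int(min(rank, np.sum(s > 1e-10)))
        return Vt[:k].T @ ((U[:, :k].T @ b) / s[:k])

The returned weight vector gives the value estimate phi @ theta for any feature vector phi; a small rank discards low-energy directions of A (more bias, less variance), while a rank near d recovers the ordinary LSTD(λ) solution.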
Similar resources
Finite element simulation of two-point incremental forming of free-form parts
The two-point incremental forming method is considered a modern technique for manufacturing shell parts. The presence of a bottom punch during the process makes this technique far more complex than its conventional counterpart, i.e., the single-point incremental forming method. Thus, the numerical simulation of this method is an essential task, which leads to the reduction of trial-and-error costs and predicts th...
Incremental Least-Squares Temporal Difference Learning
Approximate policy evaluation with linear function approximation is a commonly arising problem in reinforcement learning, usually solved using temporal difference (TD) algorithms. In this paper we introduce a new variant of linear TD learning, called incremental least-squares TD learning, or iLSTD. This method is more data-efficient than conventional TD algorithms such as TD(0) and is more comp...
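A minimal sketch of that idea, under our own naming (not the authors' code): keep the LSTD statistics A and b up to date with rank-one additions, track the residual mu = b - A @ theta, and spend only m cheap greedy coordinate updates on theta per time step. The published algorithm exploits feature sparsity for its per-step cost; this dense version just shows the update structure.

    import numpy as np

    class ILSTDSketch:
        def __init__(self, d, gamma=0.99, alpha=0.01, m=1):
            self.A = np.zeros((d, d))
            self.b = np.zeros(d)
            self.theta = np.zeros(d)
            self.mu = np.zeros(d)                # residual b - A @ theta
            self.gamma, self.alpha, self.m = gamma, alpha, m

        def update(self, phi, r, phi_next):
            u = phi - self.gamma * phi_next
            self.A += np.outer(phi, u)           # rank-one update of A
            self.b += r * phi
            self.mu += r * phi - phi * (u @ self.theta)  # keep residual consistent
            for _ in range(self.m):              # m greedy coordinate steps
                j = int(np.argmax(np.abs(self.mu)))
                step = self.alpha * self.mu[j]
                self.theta[j] += step
                self.mu -= step * self.A[:, j]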
iLSTD: Eligibility Traces and Convergence Analysis
We present new theoretical and empirical results with the iLSTD algorithm for policy evaluation in reinforcement learning with linear function approximation. iLSTD is an incremental method for achieving results similar to LSTD, the data-efficient, least-squares version of temporal difference learning, without incurring the full cost of the LSTD computation. LSTD is O(n²), where n is the number of...
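For the trace-based variant analyzed here, the only change to the statistics is that the rank-one updates are built from an eligibility trace z rather than the current feature vector. A sketch under the same assumptions as above (the small ridge term is our addition, for invertibility):

    import numpy as np

    def lstd_lambda_solution(transitions, d, gamma=0.99, lam=0.9, ridge=1e-6):
        A, b, z = np.zeros((d, d)), np.zeros(d), np.zeros(d)
        for phi, r, phi_next in transitions:
            z = gamma * lam * z + phi                  # eligibility trace
            A += np.outer(z, phi - gamma * phi_next)   # trace-weighted rank-one update
            b += r * z
        # The fixed point that iLSTD(lambda) converges to.
        return np.linalg.solve(A + ridge * np.eye(d), b)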
Two point incremental forming of a complicated shape with negative and positive dies
In this work, incremental sheet forming of a complicated shape is investigated experimentally. Two-point incremental forming with negative and positive dies is employed to manufacture a complicated shape with positive and negative truncated cones. The material is aluminum alloy 3105 with a thickness of 1 mm. The effects of process parameters such as the sequence of positive and negative form...
Properties of the Least Squares Temporal Difference learning algorithm
This paper focuses on policy evaluation using the well-known Least Squares Temporal Differences (LSTD) algorithm. We give several alternative ways of looking at the algorithm: the operator-theory approach via the Galerkin method, the statistical approach via instrumental variables, as well as the limit of the TD iteration. Further, we give a geometric view of the algorithm as an oblique projectio...
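All of these views characterize the same estimator. In standard notation (features \(\phi_t\), rewards \(r_t\), discount \(\gamma\)), LSTD solves the sampled linear system

\[
\hat{A} = \sum_t \phi_t \,(\phi_t - \gamma \phi_{t+1})^{\top}, \qquad
\hat{b} = \sum_t r_t \,\phi_t, \qquad
\hat{\theta} = \hat{A}^{-1} \hat{b},
\]

and the induced value estimate \(\Phi\hat{\theta}\) is what the abstract interprets geometrically as an oblique projection of the Bellman target onto the span of the features.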
Publication date: 2016